22 research outputs found

    When Do Transformers Shine in RL? Decoupling Memory from Credit Assignment

    Reinforcement learning (RL) algorithms face two distinct challenges: learning effective representations of past and present observations, and determining how actions influence future returns. Both challenges involve modeling long-term dependencies. The Transformer architecture has been very successful at solving problems that involve long-term dependencies, including in the RL domain. However, the underlying reason for the strong performance of Transformer-based RL methods remains unclear: is it because they learn effective memory, or because they perform effective credit assignment? After introducing formal definitions of memory length and credit assignment length, we design simple configurable tasks to measure these distinct quantities. Our empirical results reveal that Transformers can enhance the memory capacity of RL algorithms, scaling up to tasks that require memorizing observations 1500 steps ago. However, Transformers do not improve long-term credit assignment. In summary, our results provide an explanation for the success of Transformers in RL, while also highlighting an important area for future research and benchmark design.

    Adaptive Agent Architecture for Real-time Human-Agent Teaming

    Teamwork is a set of interrelated reasoning, actions and behaviors of team members that facilitate common objectives. Teamwork theory and experiments have resulted in a set of states and processes for team effectiveness in both human-human and agent-agent teams. However, human-agent teaming is less well studied because it is so new and involves asymmetry in policy and intent not present in human teams. To optimize team performance in human-agent teaming, it is critical that agents infer human intent and adapt their policies for smooth coordination. Most literature in human-agent teaming builds agents referencing a learned human model. Though these agents are guaranteed to perform well with the learned model, they place heavy assumptions on human policy, such as optimality and consistency, which are unlikely to hold in many real-world scenarios. In this paper, we propose a novel adaptive agent architecture in a human-model-free setting on a two-player cooperative game, namely Team Space Fortress (TSF). Previous human-human team research has shown complementary policies in the TSF game and diversity in human players' skill, which encourages us to relax the assumptions on human policy. Therefore, we discard learning human models from human data, and instead use an adaptation strategy on a pre-trained library of exemplar policies composed of RL algorithms or rule-based methods with minimal assumptions of human behavior. The adaptation strategy relies on a novel similarity metric to infer human policy and then selects the most complementary policy in our library to maximize the team performance. The adaptive agent architecture can be deployed in real-time and generalize to any off-the-shelf static agents. We conducted human-agent experiments to evaluate the proposed adaptive agent framework, and demonstrated the suboptimality, diversity, and adaptability of human policies in human-agent teams.
    Comment: The first three authors contributed equally. In AAAI 2021 Workshop on Plan, Activity, and Intent Recognition

    Individualized Mutual Adaptation in Human-Agent Teams

    The ability to collaborate with previously unseen human teammates is crucial for artificial agents to be effective in human-agent teams (HATs). Due to individual differences and complex team dynamics, it is hard to develop a single agent policy to match all potential teammates. In this paper, we study both human-human and human-agent teams in a dyadic cooperative task, Team Space Fortress (TSF). Results show that team performance is influenced by both players' individual skill levels and their ability to collaborate with different teammates by adopting complementary policies. Based on human-human team results, we propose an adaptive agent that identifies different human policies and assigns a complementary partner policy to optimize team performance. The adaptation method relies on a novel similarity metric to infer human policy and then selects the most complementary policy from a pre-trained library of exemplar policies. We conducted human-agent experiments to evaluate the adaptive agent and examine mutual adaptation in human-agent teams. Results show that both human adaptation and agent adaptation contribute to team performance.
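
The two-step adaptation described in the two TSF abstracts (infer which exemplar policy the human most resembles, then pick the most complementary partner policy) can be sketched roughly as follows. This is only an illustration under stated assumptions: the abstracts do not specify the similarity metric, so average log-likelihood of the observed human actions is used as a stand-in, and the policy names, action names, and team scores are all hypothetical.

```python
import math

def infer_human_policy(trajectory, exemplars):
    """Return the name of the exemplar policy that best explains the human.

    trajectory: list of (state, action) pairs observed from the human.
    exemplars:  dict mapping a policy name to a function state -> {action: prob}.
    ASSUMPTION: similarity is the average log-likelihood of the observed
    actions; the papers' actual metric is not given in the abstracts.
    """
    eps = 1e-8  # avoids log(0) for actions an exemplar never takes

    def avg_log_likelihood(policy):
        total = sum(math.log(policy(s).get(a, 0.0) + eps) for s, a in trajectory)
        return total / len(trajectory)

    return max(exemplars, key=lambda name: avg_log_likelihood(exemplars[name]))

def select_partner(human_type, team_scores):
    """Pick the agent policy with the highest recorded team score.

    team_scores: dict human_type -> {agent_policy_name: score}, e.g. measured
    offline by pairing exemplar policies with each other (hypothetical table).
    """
    scores = team_scores[human_type]
    return max(scores, key=scores.get)

# Hypothetical toy library: two exemplar human styles, two candidate partners.
exemplars = {
    "aggressive": lambda state: {"fire": 0.8, "evade": 0.2},
    "defensive": lambda state: {"fire": 0.3, "evade": 0.7},
}
team_scores = {
    "aggressive": {"agent_A": 10.0, "agent_B": 6.0},
    "defensive": {"agent_A": 5.0, "agent_B": 9.0},
}

observed = [(0, "evade"), (1, "evade"), (2, "fire")]
human_type = infer_human_policy(observed, exemplars)
partner = select_partner(human_type, team_scores)
print(human_type, partner)
```

Because both steps are lookups over a fixed library, the selection can be re-run on a sliding window of recent human actions, which is one plausible way to obtain the real-time behavior the abstracts mention.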

    Heparan Sulfate and Chondroitin Sulfate Glycosaminoglycans Are Targeted by Bleomycin in Cancer Cells

    Background/Aims: Bleomycin is a clinically used anti-cancer drug that produces DNA breaks once inside cells. However, bleomycin is a positively charged molecule and cannot enter cells by free diffusion. We previously reported that the negatively charged cell surface glycosaminoglycans (GAGs) may be involved in the cellular uptake of bleomycin. We also observed that a class of positively charged small molecules localizes to the Golgi once inside cells. We therefore hypothesized that bleomycin might perturb Golgi-operated GAG biosynthesis. Methods: We used stable isotope labeling coupled with LC/MS analysis of GAG disaccharides simultaneously from bleomycin-treated and non-treated cancer cells. To further understand the cytotoxicity of bleomycin and its relationship to GAGs, we used sodium chlorate to inhibit GAG sulfation and commercially available GAGs to compete for cell surface GAG/bleomycin interactions in seven cell lines, including CHO745, which is defective in both heparan sulfate and chondroitin sulfate biosynthesis. Results: We discovered that heparan sulfate GAG was significantly undersulfated and that the quantity and disaccharide compositions of GAGs were changed in bleomycin-treated cells in a concentration- and time-dependent manner. We revealed that bleomycin-induced cytotoxicity was directly related to cell surface GAGs. Conclusion: GAGs were targeted by bleomycin both at the cell surface and at the Golgi. Thus, GAGs might be the biologically relevant molecules related to the bleomycin-induced fibrosis seen in certain cancer patients, a severe side effect with a largely unknown molecular mechanism.

    Patient-Derived Gastric Carcinoma Xenograft Mouse Models Faithfully Represent Human Tumor Molecular Diversity.

    Patient-derived cancer xenografts (PDCX) generally represent more reliable models of human disease in which to evaluate a potential drug's preclinical efficacy. However, to date, only a few patient-derived gastric cancer xenograft (PDGCX) models have been reported. In this study, we aimed to establish additional PDGCX models and to evaluate whether these models accurately reflected the histological and genetic diversities of the corresponding patient tumors. By engrafting fresh patient gastric cancer (GC) tissues into immune-compromised mice (SCID and/or nude mice), thirty-two PDGCX models were established. Histological features were assessed by a qualified pathologist based on H&E staining. Genomic comparison was performed for several biomarkers including ERBB1, ERBB2, ERBB3, FGFR2, MET and PTEN. These biomarkers were profiled to assess gene copy number by fluorescent in situ hybridization (FISH) and/or protein expression by immunohistochemistry (IHC). All 32 PDGCX models retained the histological features of the corresponding human tumors. Furthermore, among the 32 models, 78% (25/32) highly expressed ERBB1 (EGFR), 22% (7/32) were ERBB2 (HER2) positive, 78% (25/32) showed ERBB3 (HER3) high expression, 66% (21/32) lost PTEN expression, 3% (1/32) harbored FGFR2 amplification, 41% (13/32) were positive for MET expression and 16% (5/32) were MET gene amplified. Between the PDGCX models and their parental tumors, a high degree of similarity was observed for FGFR2 and MET gene amplification, and also for ERBB2 status (agreement rate = 94~100%; kappa value = 0.81~1). Protein expression of PTEN and MET also showed moderate agreement (agreement rate = 78%; kappa value = 0.46~0.56), while ERBB1 and ERBB3 expression showed slight agreement (agreement rate = 59~75%; kappa value = 0.18~0.19). ERBB2 positivity and FGFR2 or MET gene amplification were all maintained until passage 12 in mice. The stability of the molecular profiles observed across subsequent passages within the individual models provides confidence in the utility and translational significance of these models for in vivo testing of personalized therapies.
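
For readers unfamiliar with the two statistics quoted in this abstract, the sketch below shows how an agreement rate and a kappa value relate. It assumes the kappa reported is Cohen's kappa (the abstract does not say which variant was used), and the ten tumor/xenograft pairs are made-up illustrative labels, not data from the study.

```python
def agreement_and_kappa(pairs):
    """Return (observed agreement rate, Cohen's kappa) for paired labels.

    pairs: list of (label_in_parental_tumor, label_in_xenograft) tuples.
    Kappa corrects the raw agreement rate for the agreement expected by
    chance given how often each label occurs on each side.
    """
    n = len(pairs)
    observed = sum(1 for a, b in pairs if a == b) / n

    labels = {label for pair in pairs for label in pair}
    expected = sum(
        (sum(1 for a, _ in pairs if a == label) / n)
        * (sum(1 for _, b in pairs if b == label) / n)
        for label in labels
    )
    if expected == 1.0:
        # Only one label ever occurs, so chance agreement is already perfect.
        return observed, 1.0
    return observed, (observed - expected) / (1.0 - expected)

# Made-up example: 10 tumor/xenograft pairs scored positive or negative.
toy_pairs = (
    [("pos", "pos")] * 3
    + [("neg", "neg")] * 5
    + [("pos", "neg")] * 1
    + [("neg", "pos")] * 1
)
rate, kappa = agreement_and_kappa(toy_pairs)
print(f"agreement rate = {rate:.0%}, kappa = {kappa:.2f}")
```

In this toy case the raw agreement is 80% but kappa is lower, because a fair share of matches would occur by chance alone. The same effect explains how the abstract can report agreement rates of 59~75% alongside kappa values below 0.2 for ERBB1 and ERBB3.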

    Clinical characteristics of primary gastric carcinomas.

    a: Fisher exact test; b: Mann Whitney U Test; c: Log Rank Test; *: patients had missing information. Clinical characteristics of primary gastric carcinomas.